We study the homology of simplicial complexes built via deterministic
rules from a random set of vertices. In particular, we show that,
depending on the randomness that generates the vertices, the homology
of these complexes can either become trivial as the number $n$ of
vertices grows, or can contain more and more complex structures. The
different behaviours are consequences of different underlying
distributions for the generation of vertices, and we consider three
illustrative examples, when the vertices are sampled from Gaussian,
exponential, and power-law distributions in $\mathbb{R}^d$.

We also discuss consequences of our results for manifold learning
with noisy data, describing the topological phenomena that arise in
this scenario as 'crackle', in analogy to audio crackle in temporal
signal analysis.


1. Introduction. This paper treats the homology of simplicial complexes
built via deterministic rules from a random set of vertices. In particular,
it shows that, depending on the randomness that generates the vertices,
the homology of these complexes can either become trivial as the sample size
grows, or can contain more and more complex structures.

The motivation for these results comes from applications of topological 
tools for pattern analysis, object identification, and especially for the anal- 
ysis of data sets. Typically, one starts with a collection of points and forms 
some simplicial complexes associated to these, and then takes their homology.
For example, the 0-dimensional homology of such complexes can be
interpreted as a version of clustering. The basic philosophy behind this 
attempt is that topology has an essentially qualitative nature and should 
therefore be robust with respect to small perturbations. Some recent refer- 
ences are @, S 0, BJ, B3 with two reviews, from different aspects, in [I] and 



[lo| . Many of these papers find their raison d'etre in essentially statistical 
problems, in which data generates the structures.

An important example occurs in the following manifold learning problem.
Let $\mathcal{M}$ be an unknown manifold embedded in a Euclidean space, and suppose
that we are given a set of independent and identically distributed (i.i.d.)
random samples $\mathcal{X}_n = \{X_1, \ldots, X_n\}$ from the manifold. In order to recover
the homology of $\mathcal{M}$, we consider the homology of

(1.1)    $U = \bigcup_{k=1}^{n} B_\epsilon(X_k)$,

where $B_\epsilon(X)$ is the Euclidean ball, in the ambient space, of radius $\epsilon$ about
the point $X$. The belief, or hope, is that, for large enough $n$, the homology
of $U$ will be equivalent to that of $\mathcal{M}$. A confounding issue arises when
the sample points do not necessarily lie on the manifold, but rather are
perturbed from it by a random amount. When this happens, it will follow
from our results that the precise distribution behind the randomness plays
a qualitatively important role. It is known that if the perturbations come
from a bounded or strongly concentrated distribution, then they do not
lead to much spurious homology, and the above line of attack, appropriately
applied, works. For example, it was shown in [15] that for Gaussian noise it
is possible to clean the data and recover the underlying topology of $\mathcal{M}$ in a
way that is essentially independent of the ambient dimension. Both [14, 15]
contain results of the form that, given a nice enough $\mathcal{M}$, and any $\delta > 0$,
there are explicit conditions on $n$ and $\epsilon$ such that the homology of $U$ is equal
to the homology of $\mathcal{M}$ with a probability of at least $(1 - \delta)$. However, for
other distributions no such results exist, nor, in view of the results of this
paper, are they to be expected.

Figure 1 provides an illustrative example of what happens when sampling
points from an annulus and perturbing them with additional noise before
reconstructing the annulus as in (1.1). In particular, it shows that if the
additional noise is in some sense large then sample points can appear basically
anywhere, introducing extraneous homology elements.

In order to be able, eventually, to extend the work in [15] beyond Gaussian
noise, and make more concrete statements about the probabilistic features
of the homology this extension generates, it is necessary to first focus on the
behaviour of samples generated by pure noise, with no underlying manifold.
In this case, thinking of the above setup, the manifold $\mathcal{M}$ is simply the
point at the origin, and the homology that we shall be trying to recapture is
trivial. Nevertheless, we shall see that differing noise models can make this
task extremely delicate, regardless of sample size.





Fig 1. (a) The original space $\mathcal{M}$ (an annulus) that we wish to recover from random
samples. (b) With the appropriate choice of radius, we can easily recover the homology of
the original space from random samples from $\mathcal{M}$. (c) In the presence of bounded noise,
homology recovery is undamaged. (d) In the presence of unbounded noise, many extraneous
homology elements appear, and significantly interfere with homology recovery.



1.1. Some sample results. To start being more concrete, let
$\mathcal{X} = \{X_1, \ldots, X_n\}$ be a set of $n$ i.i.d. random samples in $\mathbb{R}^d$, from a
common density function $f$. Recall that the abstract simplicial complex
$C(\mathcal{X}, \epsilon)$ constructed according to the following rules is called the Čech
complex associated to $\mathcal{X}$ and $\epsilon$:

1. The 0-simplices of $C(\mathcal{X}, \epsilon)$ are the points in $\mathcal{X}$,

2. An $n$-simplex $\sigma = [x_{i_0}, \ldots, x_{i_n}]$ is in $C(\mathcal{X}, \epsilon)$ if $\bigcap_{k=0}^{n} B_\epsilon(x_{i_k}) \neq \emptyset$.

An important result, known as the 'nerve theorem', links Čech complexes
and the neighborhood set $U$ of (1.1), establishing that they are homotopy
equivalent (cf. 0]). In particular, they have the same Betti numbers, mea- 
sures of homology that we shall concentrate on in what follows. 
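
The two rules defining the Čech complex can be made concrete in the plane. The following is a minimal sketch, not taken from the paper: the function names are illustrative, only simplices up to dimension 2 are built, and closed balls are assumed. It uses the fact that finitely many balls of radius $\epsilon$ share a point exactly when their centres fit inside one ball of radius $\epsilon$.

```python
import math
from itertools import combinations


def min_enclosing_radius(a, b, c):
    """Radius of the smallest disc containing three points of the plane."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    s1, s2, longest = sorted((ab, bc, ca))
    # Right, obtuse or degenerate triangle: the longest side is a diameter.
    if longest ** 2 >= s1 ** 2 + s2 ** 2:
        return longest / 2
    # Acute triangle: the smallest disc is the circumscribed one.
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2
    return ab * bc * ca / (4 * area)


def cech_complex_2d(points, eps):
    """Simplices (as index tuples) of C(points, eps) up to dimension 2."""
    idx = range(len(points))
    vertices = [(i,) for i in idx]
    # Two closed eps-balls meet iff their centres are at most 2*eps apart.
    edges = [(i, j) for i, j in combinations(idx, 2)
             if math.dist(points[i], points[j]) <= 2 * eps]
    # Three eps-balls share a point iff the centres fit in a disc of radius eps.
    triangles = [(i, j, k) for i, j, k in combinations(idx, 3)
                 if min_enclosing_radius(points[i], points[j], points[k]) <= eps]
    return vertices, edges, triangles


# Three points whose unit balls meet pairwise but have no common point:
# three edges and no 2-simplex, i.e. a minimal 1-cycle.
print(cech_complex_2d([(0.0, 0.0), (1.9, 0.0), (0.95, 1.65)], 1.0))
```

The pairwise test alone would give the Rips complex; the extra enclosing-disc test is what distinguishes the Čech complex in this sketch.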

If the sample distribution has a compact support $S$, then it is easy to
show that, for large enough $n$,

$C(\mathcal{X}, \epsilon) \simeq \bigcup_{k=1}^{n} B_\epsilon(X_k) \approx \mathrm{Tube}(S, \epsilon) \triangleq \{x \in \mathbb{R}^d : \min_{y \in S} \|x - y\| \le \epsilon\}$,

where $\simeq$ denotes homotopy equivalence and $\|\cdot\|$ is the standard $L^2$ norm
in $\mathbb{R}^d$. Thus, there is not much to study in this case. However, when the
support of the distribution is unbounded, interesting phenomena occur.

To study these phenomena, we shall consider three representative examples
of probability densities. These are the power-law, exponential, and the
standard Gaussian distributions, whose density functions are given,
respectively, by

(1.2)    $f_p(x) \triangleq \frac{c_p}{1 + \|x\|^\alpha}$,

(1.3)    $f_e(x) \triangleq c_e e^{-\|x\|}$,

(1.4)    $f_g(x) \triangleq c_g e^{-\|x\|^2/2}$,

where $\alpha > d$ and $c_p$, $c_e$, $c_g$ are appropriate normalization constants that will
not be of concern to us.
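
For experimentation, samples from the three densities can be drawn by combining a uniformly random direction with a random radius. The sketch below is an illustration and not part of the paper: the function names are hypothetical, and the radial laws are derived from (1.2)-(1.4) by passing to polar coordinates (for the power law, $\|X\|^\alpha$ follows a beta-prime law, generated here as a ratio of two gamma variables).

```python
import math
import random


def random_direction(d, rng):
    """Uniform point on the unit sphere of R^d (normalised Gaussian vector)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def sample_radius(kind, d, alpha, rng):
    """Draw ||X|| for the spherically symmetric densities (1.2)-(1.4)."""
    if kind == "power":
        # t = ||X||^alpha has density prop. to t^(d/alpha - 1) / (1 + t):
        # beta-prime(d/alpha, 1 - d/alpha), a ratio of gammas (needs alpha > d).
        num = rng.gammavariate(d / alpha, 1.0)
        den = max(rng.gammavariate(1.0 - d / alpha, 1.0), 1e-300)
        return (num / den) ** (1.0 / alpha)
    if kind == "exponential":
        # Radial density prop. to r^(d-1) e^(-r): a Gamma(d, 1) law.
        return rng.gammavariate(d, 1.0)
    if kind == "gaussian":
        # ||X||^2 is chi-square with d degrees of freedom: Gamma(d/2, scale 2).
        return math.sqrt(rng.gammavariate(d / 2.0, 2.0))
    raise ValueError(f"unknown kind: {kind}")


def sample_points(kind, n, d, alpha=None, seed=0):
    """n i.i.d. points in R^d from the chosen density."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        r = sample_radius(kind, d, alpha, rng)
        points.append([r * u for u in random_direction(d, rng)])
    return points
```

Sampling the radius directly avoids rejection sampling, which would be slow for the heavy-tailed power law.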

For large samples from any of these distributions we shall show that there
exists a 'core' - a region in which the density of points is very high and so
placing unit balls around them completely covers the region. Consequently,
the Čech complex inside the core is contractible. The size of the core
obviously grows to infinity as the sample size $n$ goes to infinity, but its exact size
will depend on the underlying distribution. For the three examples above, if
we denote the radius of the core by $R_n^c$, we shall prove in Section 2.1 that

$R_n^c \sim \begin{cases} (n/\log n)^{1/\alpha} & f(x) \propto \frac{1}{1 + \|x\|^\alpha}, \\ \log n & f(x) \propto e^{-\|x\|}, \\ \sqrt{2 \log n} & f(x) \propto e^{-\|x\|^2/2}. \end{cases}$

Note that in all three cases we have tacitly assumed that the cores are balls,
a natural consequence of the spherical symmetry of the probability densities.
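
To get a feeling for how differently the three core radii grow, the leading-order expressions can simply be evaluated. This is a small sketch with a hypothetical helper name; constants and lower-order terms are ignored, and the value $\alpha = 3$ is an arbitrary choice for illustration.

```python
import math


def core_radius_order(n, kind, alpha=None):
    """Leading-order core radius for a sample of size n (constants ignored)."""
    if kind == "power":
        return (n / math.log(n)) ** (1.0 / alpha)
    if kind == "exponential":
        return math.log(n)
    if kind == "gaussian":
        return math.sqrt(2.0 * math.log(n))
    raise ValueError(f"unknown kind: {kind}")


for n in (10 ** 3, 10 ** 6, 10 ** 9):
    row = [round(core_radius_order(n, kind, alpha=3.0), 2)
           for kind in ("power", "exponential", "gaussian")]
    print(n, row)
```

The ordering between the power-law and exponential cores is asymptotic: for moderate $n$ and this choice of $\alpha$ the polynomial rate has not yet overtaken $\log n$.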

Beyond the core, the topology is more varied. For fixed n, there may be 
additional isolated components, but no longer enough placed densely enough 
to connect with one another and to form a contractible set. Indeed, we shall 
show that the individual components will typically have enough homology to 
be, individually, non-contractible. Thus, in this region, the topology of the 
Čech complex is highly nontrivial, and many homology elements of different
orders appear. We call this phenomenon 'crackling', akin to the well known 
phenomenon caused by noise interference in audio signals and commonly 
referred to as crackling. 

As for core size, the exact crackling behaviour depends on the choice of
distribution. It turns out that Gaussian samples do not lead to crackling, but
the other two cases do. To describe this, with some imprecision of notation
we shall write $[a, b)$ not only for an interval on the real line, but also for the
annulus

$[a, b) \triangleq \{x \in \mathbb{R}^d : a \le \|x\| < b\}$.

In Sections 2.2 and 2.3 we shall show that the exterior of the core can be
divided into disjoint spherical annuli at radii

$R_n^c \ll R_{d-1,n} \ll R_{d-2,n} \ll \cdots \ll R_{0,n}$

(defined differently for each of the two crackling distributions) with different
types of crackling (i.e. of homology) dominating in different regions.
In $[R_{0,n}, \infty)$ there are mostly disconnected points, and no structures with
nontrivial homology. In $[R_{1,n}, R_{0,n})$ connectivity is a bit higher, and a finite
number of 1-cycles appear. In $[R_{2,n}, R_{1,n})$ we have a finite number of 2-cycles,
while the number of 1-cycles grows to infinity as $n \to \infty$. In general,
in $[R_{k,n}, R_{k-1,n})$, as $n \to \infty$ we have a finite number of $k$-cycles, infinitely
many $l$-cycles for $l < k$, and no cycles of dimension $l > k$. In other words,
the crackle starts with a pure dust at $R_{0,n}$ and as we get closer to the
core, higher dimensional homology gradually appears. See Figure 2 in the
following section for more details.

As we already mentioned, the Gaussian distribution is fundamentally
different from the other two, and does not lead to crackling. In Section 2.4
we show that, for the Gaussian distribution, there are hardly any points
located outside the core. Thus, as $n \to \infty$, the union of balls around the
sample points becomes a giant contractible ball of radius of order $\sqrt{2 \log n}$.

It is now possible to understand a little better how the results of this paper
relate to the noisy manifold learning problem discussed above. For example,
if the distribution of the noise is Gaussian, our results imply that if the
manifold is well behaved, and the sample size is moderate, noise outliers
should not significantly interfere with homology recovery, since Gaussian
noise does not introduce artificial homology elements with large samples.
However, there is a delicate counterbalance here between 'moderate' and
'large'. Once the sample size is large, the core is also large, and the
reconstructed manifold will have the topology of $\mathcal{M} \oplus B_{\sqrt{2 \log n}}(0)$, where $\oplus$
is Minkowski addition. As $n$ grows, the core will eventually envelop any
compact manifold, and thus the homology of $\mathcal{M}$ will be hidden by that of
the core.

On the other hand, if the distribution of the noise is power-law or ex- 
ponential, then noise outliers will typically generate extraneous homology 
elements that, for almost any sample size, will complicate the estimation 
of the original manifold. Furthermore, increasing the sample size in no way
solves this problem. Note that this issue is in addition to the fact that
increasing the sample size will, as in the Gaussian case, create the problem of
a large core concealing the topology of M. 

Thus, from a practical point of view, the message of this paper is that 
outliers cause problems in manifold estimation when noise is present, a fact 
well known to all practitioners who have worked in the area. What is qualita- 
tively new here is a quantification of how this happens, and how it relates to 
the distribution of the noise. We do not attempt to solve this problem here, 
but unfortunately it follows from the results of this paper that algorithms for 
handling outliers will probably involve knowing at least the tail behaviour of 
the error distribution, despite the fact that in practical situations one does
not generally want to assume such prior knowledge.

1.2. On persistence intervals. While the above discussion has concen- 
trated on the persistence of noise induced crackle as sample sizes grow, and 
the regions in M. d in which different types of homology appear, the proofs 
below also yield information about the more classical persistence diagrams 
of topological data analysis (cf. 

For example, in the two cases for which crackle persists - the power-law
and exponential cases - estimates of the type appearing in Section 3 indicate
that, with high probability, there exist extremely long bars in the bar code
representation of persistent homology. Up to lower order corrections,
preliminary calculations show that bar lengths for the $k$-th homology can be as
large as $O(n^{\alpha_k})$ for the power-law case, and $\beta_k (\log \log n)$ for the exponential
case, for appropriate $\alpha_k$ and $\beta_k$. More detailed studies of these phenomena
will appear in a later publication.

1.3. Poisson Processes. Although we have described everything so far in
terms of a random sample $\mathcal{X}$ of $n$ points taken from a density $f$, there is
another way to approach the results of this paper, and that is to replace the
points of $\mathcal{X}$ with the points of a $d$-dimensional Poisson process $\mathcal{P}_n$ whose
intensity function is given by $\lambda_n = nf$. In this case the number of points is
no longer fixed, but has mean $n$.

All the results of this paper stated for $\mathcal{X}$ hold, without any change, if we
replace $\mathcal{X}$ by $\mathcal{P}_n$.
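
A Poisson process with intensity $nf$ can be simulated by first drawing a Poisson number of points with mean $n$ and then sampling that many i.i.d. points from $f$. The sketch below illustrates this under stated assumptions: the helper names are hypothetical, the Poisson count is generated by counting unit-rate exponential arrivals (adequate for moderate $n$), and a planar standard Gaussian stands in for $f$.

```python
import random


def poisson_count(mean, rng):
    """Number of arrivals of a rate-1 Poisson process in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def poissonized_sample(n, draw_point, rng):
    """Points of a Poisson process with intensity n*f, given a sampler for f."""
    return [draw_point(rng) for _ in range(poisson_count(n, rng))]


rng = random.Random(0)


def gaussian_point(r):
    """Stand-in sampler for f: a standard Gaussian point in the plane."""
    return (r.gauss(0.0, 1.0), r.gauss(0.0, 1.0))


sizes = [len(poissonized_sample(200, gaussian_point, rng)) for _ in range(5)]
print(sizes)  # random sample sizes scattered around 200
```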

1.4. Disclaimers. Before starting the paper in earnest, and so as not
to be accused of myopia, we note that the subject of manifold learning is
obviously much broader than that described above, and algorithms for
'estimating' an underlying manifold from a finite sample abound in the statistics
and computer science literatures. Very few of them, however, take the
algebraic point of view that we or the literature quoted above take. Furthermore,
we note that other important results about the homology of Rips and Čech
complexes for various distributions can be found in the papers [11], [12] and
[3]. However, the methods and emphases of these papers are rather different
from ours.

2. Results. In this section we shall present all our main results, along
with some discussion, more technical than that of the Introduction. Recall
from Section 1.3 that although we present all results for the point set $\mathcal{X}$,
they also hold if we replace the points of $\mathcal{X}$ by the points of an appropriate
Poisson process. All proofs are deferred to Section 3.

2.1. The Core of Distributions with Unbounded Support. We start by examining
the core of the power-law, exponential and Gaussian distributions.
These distributions are spherically symmetric and the samples are concentrated
near the origin. By 'core' we refer to a centered ball $B_{R_n} \triangleq B_{R_n}(0) \subset \mathbb{R}^d$
containing a very large number of points from the sample $\mathcal{X}_n$, such that

$B_{R_n} \subset \bigcup_{X \in \mathcal{X}_n \cap B_{R_n}} B_1(X)$,

i.e. the unit balls around the sample points completely cover $B_{R_n}$. In this
case the homology of $\bigcup_{X \in \mathcal{X}_n \cap B_{R_n}} B_1(X)$, or equivalently, of $C(\mathcal{X}_n \cap B_{R_n}, 1)$,
is trivial. Obviously, as $n \to \infty$, the radius $R_n$ grows as well.

Let $\{R_n\}_{n=1}^{\infty}$ be an increasing sequence of positive numbers. Define by $C_n$
the event that $B_{R_n}$ is covered, i.e.

$C_n \triangleq \left\{ B_{R_n} \subset \bigcup_{X \in \mathcal{X}_n \cap B_{R_n}} B_1(X) \right\}$.

We wish to find the largest possible value of $R_n$ such that $\mathbb{P}(C_n) \to 1$. The
following theorem presents lower bounds for this value.

Theorem 2.1. Let $\epsilon > 0$, and define

$R_n^c \triangleq \begin{cases} \left( \frac{\delta_p n}{\log n - e^{-\epsilon} \log\log n} \right)^{1/\alpha} & f = f_p, \\ \log n - \log\log\log n - \delta_e - \epsilon & f = f_e, \\ \sqrt{2 (\log n - \log\log\log n - \delta_g - \epsilon)} & f = f_g, \end{cases}$

where the three distributions are given by (1.2)-(1.4), and

$\delta_p = c_p \alpha 2^{-d} d^{-(1 + d/2)}$,

$\delta_e = (1 + d/2) \log d + d \log 2 - \log c_e$,

$\delta_g = (1 + d/2) \log d + (d - 1) \log 2 - \log c_g$.

If $R_n \le R_n^c$, then $\mathbb{P}(C_n) \to 1$.

Theorem 2.1 implies that the core size has a completely different order of
magnitude for each of the three distributions. The heavy-tailed, power-law
distribution has the largest core, while the core of the Gaussian distribution
is the smallest. In the following sections we shall study the behaviour of the
Čech complex outside the core.

2.2. How Power-Law Noise Crackles. In this section we explore the crackling
phenomenon in the power-law distribution $f = f_p$. Let $B_{R_n} \subset \mathbb{R}^d$ be
the centered ball with radius $R_n$, and let

$\mathcal{C}_n \triangleq C(\mathcal{X}_n \cap (B_{R_n})^c, 1)$

be the Čech complex constructed from sample points outside $B_{R_n}$. We wish
to study

$\beta_{k,n} \triangleq \beta_k(\mathcal{C}_n)$,

the $k$-th Betti number of $\mathcal{C}_n$.

Note that the minimum number of points required to form a $k$-dimensional
cycle ($k \ge 1$) is $k + 2$. For $k \ge 1$ and $\mathcal{Y} \subset \mathbb{R}^d$, denote

$T_k(\mathcal{Y}) \triangleq \mathbb{1}\{|\mathcal{Y}| = k + 2,\ \beta_k(C(\mathcal{Y}, 1)) = 1\}$,

i.e. $T_k$ takes the value 1 if $C(\mathcal{Y}, 1)$ is a minimal $k$-dimensional cycle, and 0
otherwise. This indicator function will be used to define the limits of the
Betti numbers.
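
The simplest of these Betti numbers, the number of connected components outside a ball, is easy to compute for a given point set: two unit balls intersect exactly when their centres are at most 2 apart, so it suffices to count the components of the resulting graph. The following sketch is illustrative only; the function name is hypothetical, and the ball is taken to be closed so that "outside" means norm strictly greater than the radius.

```python
import math


def betti0_outside_ball(points, radius):
    """Number of components of the unit-radius Cech complex on the
    points lying outside the centred ball of the given radius."""
    outside = [p for p in points if math.hypot(*p) > radius]
    parent = list(range(len(outside)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = len(outside)
    for i in range(len(outside)):
        for j in range(i + 1, len(outside)):
            # Unit balls intersect iff the centres are at most 2 apart.
            if math.dist(outside[i], outside[j]) <= 2.0:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


pts = [(0.0, 0.0), (5.0, 0.0), (6.5, 0.0), (0.0, 9.0), (-7.0, 0.0)]
print(betti0_outside_ball(pts, 3.0))
```

The quadratic pairwise loop is adequate for small experiments; a spatial index would be needed for large samples.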

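Theorem 2.2. If $\lim_{n \to \infty} n R_n^{-\alpha} = 0$, then

$\lim_{n \to \infty} (n R_n^{d - \alpha})^{-1} \mathbb{E}\{\beta_{0,n}\} = \mu_{p,0}$,

$\lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{\beta_{k,n}\} = \mu_{p,k}$,  $1 \le k \le d - 1$,

where

(2.1)    $\mu_{p,0} \triangleq \frac{s_{d-1} c_p}{\alpha - d}$,

(2.2)    $\mu_{p,k} \triangleq \frac{s_{d-1} c_p^{k+2}}{(\alpha(k+2) - d)(k+2)!} \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y}$,  $1 \le k \le d - 1$,

and where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional unit sphere in
$\mathbb{R}^d$.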

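Next, we define the following values, which will serve as critical radii for
the crackle,

$R^\epsilon_{0,n} \triangleq n^{\frac{1}{\alpha - d} + \epsilon}$,
$R_{0,n} \triangleq R^0_{0,n}$,
$R^\epsilon_{k,n} \triangleq n^{\frac{1}{\alpha - d/(k+2)} + \epsilon}$,  $(k \ge 1)$
$R_{k,n} \triangleq R^0_{k,n}$.

The following is a straightforward corollary of Theorem 2.2 and summarizes
the behaviour of $\mathbb{E}\{\beta_{k,n}\}$ in the power-law case.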



Corollary 2.3. For k > and e > 0, 
lim E = < 



R n = R e kn , 



Theorem 2.2 and Corollary 2.3 reveal that the crackling behaviour is
organized into separate 'layers', see Figure 2. Dividing $\mathbb{R}^d$ into a sequence
of annuli at the critical radii defined above, we observe a different
behaviour of the Betti numbers in each annulus. We
shall briefly review the behaviour in each annulus, in decreasing order of
radii values. The following description is mainly qualitative, and refers to
expected values only.

• $[R^\epsilon_{0,n}, \infty)$ - there are hardly any points ($\beta_k \sim 0$, $0 \le k \le d - 1$).

• $[R_{0,n}, R^\epsilon_{0,n})$ - points start to appear, and $\beta_0 \sim \mu_{p,0}$. The points are very
few and scattered, so no cycles are generated ($\beta_k \sim 0$, $1 \le k \le d - 1$).

• $[R^\epsilon_{1,n}, R_{0,n})$ - the number of components grows to infinity, but no cycles
are formed yet ($\beta_0 \sim \infty$, and $\beta_k \sim 0$, $1 \le k \le d - 1$).

• $[R_{1,n}, R^\epsilon_{1,n})$ - a finite number of 1-dimensional cycles show up, among
the infinite number of components ($\beta_0 \sim \infty$, $\beta_1 \sim \mu_{p,1}$, and $\beta_k \sim 0$,
$1 < k \le d - 1$).

• $[R^\epsilon_{2,n}, R_{1,n})$ - we have $\beta_0 \sim \infty$, $\beta_1 \sim \infty$, and $\beta_k \sim 0$ for $k > 1$.
This process goes on, until the $(d - 1)$-dimensional cycles appear -

• $[R_{d-1,n}, R^\epsilon_{d-1,n})$ - we have $\beta_{d-1} \sim \mu_{p,d-1}$ and $\beta_k \sim \infty$ for $0 \le k \le d - 2$.

• $[R_n^c, R_{d-1,n})$ - just before we reach the core, the complex exhibits the
most intricate structure, with $\beta_k \sim \infty$ for $0 \le k \le d - 1$.

Note that there is a very fast phase transition as we move from the
contractible core to the first crackle layer. At this point we do not know exactly
where and how this phase transition takes place. A reasonable conjecture
would be that the transition occurs at $R_n = n^{1/\alpha}$ (since at this radius the
term $n R_n^{-\alpha}$ that appears in Theorem 2.2 changes its limit, affecting the
limiting Betti numbers). However, this remains for future work.
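
The layered picture can be explored numerically. The sketch below assumes, as the normalising terms of Theorem 2.2 suggest, that the critical radius for $\beta_k$ is the value of $R_n$ at which $n^{j} R_n^{d - \alpha j}$ stays of order one, where $j$ is the number of points needed ($j = 1$ for components, $j = k + 2$ for a $k$-cycle). The function names and parameter values are illustrative, not taken from the paper.

```python
import math


def vertices_needed(k):
    """Points needed for the smallest object counted by beta_k."""
    return 1 if k == 0 else k + 2


def power_law_radius(n, k, d, alpha, eps=0.0):
    """Critical radius n^(1/(alpha - d/j) + eps) for the power-law density."""
    j = vertices_needed(k)
    return n ** (1.0 / (alpha - d / j) + eps)


def expected_betti_scale(n, radius, k, d, alpha):
    """Normalising term n^j * R^(d - alpha*j) of Theorem 2.2."""
    j = vertices_needed(k)
    return n ** j * radius ** (d - alpha * j)


n, d, alpha = 1e6, 3, 4.0
print("core order:", (n / math.log(n)) ** (1.0 / alpha))
for k in range(d):
    r = power_law_radius(n, k, d, alpha)
    print(k, r, expected_betti_scale(n, r, k, d, alpha))
```

With these values the radii decrease in $k$ and all stay above the order of the core radius, matching the nesting of the annuli described above.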




Fig 2. The layered behaviour of crackle. Inside the core ($B_{R_n^c}$) the complex consists of a
single component and no cycles. The exterior of the core is divided into separate annuli.
Going from right to left, we see how the Betti numbers grow. In each annulus we present
the Betti number that was most recently changed.

2.3. How Exponential Noise Crackles. In this section we focus on the
exponential density function $f = f_e$. The results in this section are very
similar to those for the power-law distribution, and we shall describe
them briefly. Differences lie in the specific values of the $R_{k,n}$ and in the
terms in the limit formulae.

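Theorem 2.4. If $\lim_{n \to \infty} n e^{-R_n} = 0$, then

$\lim_{n \to \infty} (n R_n^{d-1} e^{-R_n})^{-1} \mathbb{E}\{\beta_{0,n}\} = \mu_{e,0}$,

$\lim_{n \to \infty} (n^{k+2} R_n^{d-1} e^{-(k+2) R_n})^{-1} \mathbb{E}\{\beta_{k,n}\} = \mu_{e,k}$,  $k \ge 1$,

where

(2.3)    $\mu_{e,0} \triangleq s_{d-1} c_e$,

(2.4)    $\mu_{e,k} \triangleq \frac{s_{d-1} c_e^{k+2}}{(k+2)!} \int_0^\infty \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\, e^{-((k+2)\rho + \sum_{i=1}^{k+1} y_i^1)} \prod_{i=1}^{k+1} \mathbb{1}\{y_i^1 > -\rho\}\,d\mathbf{y}\,d\rho$,

and where $y_i^1$ is the first coordinate of $y_i \in \mathbb{R}^d$.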
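Next, define

$R^\epsilon_{0,n} \triangleq \log n + (d - 1 + \epsilon) \log\log n$,
$R_{0,n} \triangleq R^0_{0,n}$,
$R^\epsilon_{k,n} \triangleq \log n + \left( \frac{d - 1}{k + 2} + \epsilon \right) \log\log n$,  $(k \ge 1)$
$R_{k,n} \triangleq R^0_{k,n}$.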

From Theorem 2.4 we can conclude the following.
Corollary 2.5. For k > and e > 0, 



lim E{/3 fc , n } 



i? n — R% n , 

Me, A; — Rk,m 

OO = i?7 



As in the power-law case, Theorem 2.4 implies the same 'layered'
behaviour, the only difference being in the values of $R_{k,n}$. From examining the
values of $R_n^c$ and $R_{k,n}$ it is reasonable to guess that the phase transition in
the exponential case occurs at $R_n = \log n$.

2.4. Gaussian Noise Does Not Crackle. Simplicial complexes built over
vertices sampled from the standard Gaussian distribution exhibit a completely
different behaviour from that we saw in the power-law and exponential
cases. Define

$R^\epsilon_{g,n} \triangleq \sqrt{2 \log n + (d - 2 + \epsilon) \log\log n}$,

then

Theorem 2.6. If $f = f_g$, $\epsilon > 0$, and $R_n = R^\epsilon_{g,n}$, then for $0 \le k \le d - 1$

$\lim_{n \to \infty} \mathbb{E}\{\beta_{k,n}\} = 0$.

Note that in the Gaussian case $\lim_{n \to \infty} (R^\epsilon_{g,n} - R_n^c) = 0$. This implies
that as $n \to \infty$ we have the core which is contractible, and outside the core
there is hardly anything. In other words, the ball placed around every new
point we add to the sample immediately connects to the core, and thus, the
Gaussian noise does not crackle.
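
The closing of the gap between the outer radius and the core radius is slow, which a short computation makes visible. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the same $\epsilon$ is used in both radii, the Gaussian core radius and $\delta_g$ are taken from Theorem 2.1, $c_g = (2\pi)^{-d/2}$ is the standard Gaussian normalising constant, and $\log n$ is passed directly so that astronomically large $n$ can be tried.

```python
import math


def gaussian_gap(log_n, d, eps):
    """Outer radius minus core radius for the standard Gaussian in R^d."""
    c_g = (2.0 * math.pi) ** (-d / 2.0)  # normalising constant of f_g
    delta_g = (1 + d / 2) * math.log(d) + (d - 1) * math.log(2) - math.log(c_g)
    loglog = math.log(log_n)
    outer = math.sqrt(2.0 * log_n + (d - 2 + eps) * loglog)
    core = math.sqrt(2.0 * (log_n - math.log(loglog) - delta_g - eps))
    return outer - core


for exponent in (30, 100, 300, 1000):
    log_n = exponent * math.log(10.0)  # n = 10^exponent
    print(exponent, round(gaussian_gap(log_n, d=2, eps=0.1), 4))
```

The gap shrinks roughly like $\log\log n / \sqrt{\log n}$, so it is still of visible size for any realistic sample.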



3. Proofs. We now turn to proofs, starting with the proof of the main
result of Section 2.1.



3.1. The Core. 



Proof of Theorem 2.1. The proof covers all three distributions, except
for specific calculations near the end. Take a grid on $\mathbb{R}^d$ of size $g = (2\sqrt{d})^{-1}$.
Let $\mathcal{Q}_n$ be the collection of cubes in this grid that are contained in $B_{R_n}$. Let
$\hat{C}_n$ be the following event

$\hat{C}_n \triangleq \{\forall Q \in \mathcal{Q}_n : Q \cap \mathcal{X}_n \neq \emptyset\}$,

i.e. $\hat{C}_n$ is the event that every cube in $\mathcal{Q}_n$ contains at least one point from
$\mathcal{X}_n$. Recall the definition of $C_n$,

$C_n \triangleq \left\{ B_{R_n} \subset \bigcup_{X \in \mathcal{X}_n \cap B_{R_n}} B_1(X) \right\}$.

Then it is easy to show that $\hat{C}_n \subset C_n$. The complementary event $\hat{C}_n^c$ is the
event that at least one cube is empty. Thus,

$\mathbb{P}(\hat{C}_n^c) \le \sum_{Q \in \mathcal{Q}_n} \mathbb{P}(Q \cap \mathcal{X}_n = \emptyset) = \sum_{Q \in \mathcal{Q}_n} (1 - p(Q))^n \le \sum_{Q \in \mathcal{Q}_n} e^{-n p(Q)}$,

where

$p(Q) = \int_Q f(z)\,dz \ge g^d f(R_n)$.

In addition, the number of cubes that are contained in $B_{R_n}$ is less than
$(2R_n/g)^d$. Therefore,

(3.1)    $\mathbb{P}(\hat{C}_n^c) \le (2g^{-1})^d R_n^d e^{-n g^d f(R_n)}$.

Now, choose any $\epsilon > 0$ and set

$R_n^c \triangleq \begin{cases} \left( \frac{\delta_p n}{\log n - e^{-\epsilon} \log\log n} \right)^{1/\alpha} & f = f_p, \\ \log n - \log\log\log n - \delta_e - \epsilon & f = f_e, \\ \sqrt{2 (\log n - \log\log\log n - \delta_g - \epsilon)} & f = f_g, \end{cases}$

where

$\delta_p = c_p \alpha 2^{-d} d^{-(1 + d/2)}$,

$\delta_e = \log d - \log c_e - \log g^d$,

$\delta_g = \log(d/2) - \log c_g - \log g^d$.

It is easy to verify that in all cases we have

$R_n^d e^{-n g^d f(R_n)} \to 0$.

Thus, from (3.1) we conclude that $\mathbb{P}(\hat{C}_n) \to 1$. Since $\mathbb{P}(C_n) \ge \mathbb{P}(\hat{C}_n)$ we
now have that for $R_n = R_n^c$, in each of the distributions,

$\mathbb{P}(C_n) \to 1$,

which completes the proof. □
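
The "easy to verify" step can be checked numerically in the exponential case. The sketch below rests on assumptions that should be read as such: the function names are hypothetical, the grid size is taken to be $g = 1/(2\sqrt{d})$ because that value reproduces the constants $\delta_e$ and $\delta_g$ of Theorem 2.1, and $c_e = 1/(s_{d-1}\Gamma(d))$ is the constant that normalises $e^{-\|x\|}$ on $\mathbb{R}^d$. With these choices the exponent $n g^d f(R_n^c)$ equals $d\,e^{\epsilon} \log\log n$ exactly, which is what forces the bound (3.1) to zero.

```python
import math


def sphere_area(d):
    """Surface area of the unit sphere in R^d."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def exponential_core_bound(n, d, eps):
    """Core radius, exponent n*g^d*f(R) and the bound (3.1) for f = f_e."""
    c_e = 1.0 / (sphere_area(d) * math.gamma(d))  # normalises exp(-||x||)
    g = 1.0 / (2.0 * math.sqrt(d))                # assumed grid size
    delta_e = (1 + d / 2) * math.log(d) + d * math.log(2) - math.log(c_e)
    radius = math.log(n) - math.log(math.log(math.log(n))) - delta_e - eps
    exponent = n * g ** d * c_e * math.exp(-radius)
    bound = (2.0 / g) ** d * radius ** d * math.exp(-exponent)
    return radius, exponent, bound


for n in (1e8, 1e16, 1e32):
    radius, exponent, bound = exponential_core_bound(n, d=2, eps=0.5)
    print(n, round(radius, 3), round(exponent, 3), bound)
```

The bound decays only like a negative power of $\log n$, so the convergence $\mathbb{P}(C_n) \to 1$ guaranteed by this argument is slow.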
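3.2. Crackle - Notation and General Lemmas. For $R_n > 0$, set

$\mathcal{X}_{n,R_n} \triangleq \mathcal{X}_n \cap (B_{R_n})^c$,

i.e. $\mathcal{X}_{n,R_n}$ consists of the points of $\mathcal{X}_n$ located outside the ball $B_{R_n}$. Next,
recall the definition of

$T_k(\mathcal{Y}) = \mathbb{1}\{|\mathcal{Y}| = k + 2,\ \beta_k(C(\mathcal{Y}, 1)) = 1\}$,

for $\mathcal{Y} \subset \mathbb{R}^d$, and write

$S_{0,n} \triangleq |\mathcal{X}_{n,R_n}|$,

$\hat{S}_{0,n} \triangleq \#\{X \in \mathcal{X}_{n,R_n} : X \text{ is a connected component of } C(\mathcal{X}_n, 1)\}$,

$S_{k,n} \triangleq \sum_{\mathcal{Y} \subset \mathcal{X}_{n,R_n}} T_k(\mathcal{Y})$,

$\hat{S}_{k,n} \triangleq \sum_{\mathcal{Y} \subset \mathcal{X}_{n,R_n}} T_k(\mathcal{Y})\, \mathbb{1}\{C(\mathcal{Y}, 1) \text{ is a connected component of } C(\mathcal{X}_n, 1)\}$,

$L_{k,n} \triangleq \sum_{\mathcal{Y} \subset \mathcal{X}_{n,R_n}} \mathbb{1}\{|\mathcal{Y}| = k + 3,\ C(\mathcal{Y}, 1) \text{ is connected}\}$,

where $k \ge 1$. Observe that

(3.2)    $\hat{S}_{0,n} \le \beta_{0,n} \le S_{0,n}$,

(3.3)    $\hat{S}_{k,n} \le \beta_{k,n} \le S_{k,n} + L_{k,n}$,  $k \ge 1$.

We will evaluate the limits of $\mathbb{E}\{S_{k,n}\}$, $\mathbb{E}\{\hat{S}_{k,n}\}$ and $\mathbb{E}\{L_{k,n}\}$ and deduce
from these the limit of $\mathbb{E}\{\beta_{k,n}\}$.
In addition, set

$e_1 = (1, 0, \ldots, 0) \in \mathbb{R}^d$,

$f(r) \triangleq f(r e_1)$,  $r \in \mathbb{R}$,

$U(\mathbf{x}) \triangleq \bigcup_{i=1}^{k} B_2(x_i)$,  $\mathbf{x} \in (\mathbb{R}^d)^k$,

$p(\mathbf{x}) \triangleq \int_{U(\mathbf{x})} f(z)\,dz$,  $\mathbf{x} \in (\mathbb{R}^d)^k$.

The following two lemmas are purely technical, but will considerably simplify
our computations later.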

Lemma 3.1. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a spherically symmetric probability
density. Then,

$\mathbb{E}\{S_{0,n}\} = s_{d-1} n \int_{R_n}^\infty r^{d-1} f(r)\,dr$,

$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1} n \int_{R_n}^\infty r^{d-1} f(r) (1 - p(r e_1))^{n-1}\,dr$,

where $s_{d-1}$ is the volume of the $(d-1)$-dimensional unit sphere.

Proof. $S_{0,n}$ is simply a sum of Bernoulli variables, therefore

$\mathbb{E}\{S_{0,n}\} = n \mathbb{P}(\|X\| > R_n) = n \int_{\mathbb{R}^d} f(x) \mathbb{1}\{\|x\| > R_n\}\,dx$.

Writing the integral in polar coordinates yields

$\mathbb{E}\{S_{0,n}\} = n \int_{R_n}^\infty \int_{S^{d-1}} f(r\theta) r^{d-1} J(\theta)\,d\theta\,dr$,

where $J(\theta) = |\partial x / \partial \theta|$. Since $f$ is spherically symmetric, $f(r\theta) = f(r)$, and
therefore

$\mathbb{E}\{S_{0,n}\} = s_{d-1} n \int_{R_n}^\infty r^{d-1} f(r)\,dr$.

The proof for $\hat{S}_{0,n}$ is similar, using the fact that the probability that a point
$x \in \mathbb{R}^d$ is disconnected from the rest of the complex $C(\mathcal{X}_n, 1)$ is $(1 - p(x))^{n-1}$.

□
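
The first formula of the lemma can be sanity-checked by simulation in a case where the integral has a closed form. For the standard Gaussian in the plane, $s_1 = 2\pi$ and $f(r) = e^{-r^2/2}/(2\pi)$, so the expected number of sample points outside the ball of radius $R$ is $n e^{-R^2/2}$. The sketch below is illustrative: the function name, sample size and radius are arbitrary choices.

```python
import math
import random


def mean_points_outside(n, radius, trials, seed=0):
    """Monte Carlo estimate of the expected number of points outside the
    centred ball, for n standard Gaussian points in the plane."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(
            1 for _ in range(n)
            if math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) > radius
        )
    return total / trials


n, radius = 1000, 2.0
exact = n * math.exp(-radius ** 2 / 2.0)  # closed form of the integral
print(exact, mean_points_outside(n, radius, trials=50))
```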

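Lemma 3.2. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a spherically symmetric probability
density. Then, for $k \ge 1$,

$\mathbb{E}\{S_{k,n}\} = s_{d-1} \binom{n}{k+2} \int_{R_n}^\infty r^{d-1} f(r) G_k(r)\,dr$,

$\mathbb{E}\{\hat{S}_{k,n}\} = s_{d-1} \binom{n}{k+2} \int_{R_n}^\infty r^{d-1} f(r) \hat{G}_k(r)\,dr$,

where $s_{d-1}$ is the volume of the $(d-1)$-dimensional unit sphere, and where

$G_k(r) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(r e_1 + \mathbf{y})\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|r e_1 + y_i\| > R_n\}\,d\mathbf{y}$,

$\hat{G}_k(r) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(r e_1 + \mathbf{y})\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|r e_1 + y_i\| > R_n\}\, (1 - p(r e_1, r e_1 + \mathbf{y}))^{n-k-2}\,d\mathbf{y}$.

Here, for a tuple of points, $f(r e_1 + \mathbf{y})$ is shorthand for $\prod_{i=1}^{k+1} f(r e_1 + y_i)$.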

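Proof. The proof is in the same spirit as the proof of Lemma 3.1, but
technically more complicated. Thinking of $S_{k,n}$ as a sum of Bernoulli
variables, we have that

$\mathbb{E}\{S_{k,n}\} = \binom{n}{k+2} \int_{(\mathbb{R}^d)^{k+2}} f(\mathbf{x})\, T_k(\mathbf{x}) \prod_{i=1}^{k+2} \mathbb{1}\{\|x_i\| > R_n\}\,d\mathbf{x}$.

Let $I_k$ denote the integral above. Then, using the change of variables

$x_1 \to x$,  $x_i \to x + y_{i-1}$,  $(i > 1)$,

yields

$I_k = \int_{\|x\| > R_n} \int_{(\mathbb{R}^d)^{k+1}} f(x) f(x + \mathbf{y})\, T_k(x, x + \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|x + y_i\| > R_n\}\,d\mathbf{y}\,dx$

$\quad = \int_{\|x\| > R_n} \int_{(\mathbb{R}^d)^{k+1}} f(x) f(x + \mathbf{y})\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|x + y_i\| > R_n\}\,d\mathbf{y}\,dx$.

Moving to polar coordinates yields

$I_k = \int_{R_n}^\infty \int_{S^{d-1}} \int_{(\mathbb{R}^d)^{k+1}} f(r\theta) f(r\theta + \mathbf{y})\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta + y_i\| > R_n\}\, r^{d-1} J(\theta)\,d\mathbf{y}\,d\theta\,dr$

$\quad = \int_{R_n}^\infty r^{d-1} f(r) \int_{S^{d-1}} J(\theta) \int_{(\mathbb{R}^d)^{k+1}} f(\|r\theta + \mathbf{y}\|)\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta + y_i\| > R_n\}\,d\mathbf{y}\,d\theta\,dr$,

where $J(\theta) = |\partial x / \partial \theta|$, and $f(x) = f(\|x\|)$ by the spherical symmetry
assumption. Set

$G_k(r, \theta) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|r\theta + \mathbf{y}\|)\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta + y_i\| > R_n\}\,d\mathbf{y}$.

Since $T_k$ is rotation invariant, it is easy to show that for every $\theta \in S^{d-1}$

$G_k(r, \theta) = G_k(r, e_1) \triangleq G_k(r)$.

Thus,

(3.4)    $I_k = s_{d-1} \int_{R_n}^\infty r^{d-1} f(r) G_k(r)\,dr$.

This completes the proof for $S_{k,n}$. The proof for $\hat{S}_{k,n}$ is similar. □

In what follows, we shall use the following elementary limits:

1. For every $k \ge 0$,

(3.5)    $\lim_{n \to \infty} n^{-k} \binom{n}{k} = \frac{1}{k!}$.

2. For every sequence $a_n \to 0$ and $k \ge 0$,

(3.6)    $\lim_{n \to \infty} \frac{(1 - a_n)^{n-k}}{e^{-n a_n}} = 1$.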
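
Both elementary limits are easy to observe numerically. The sketch below uses hypothetical helper names and, for (3.6), the particular sequence $a_n = n^{-2}$, for which $n a_n \to 0$; this mirrors the way the limit is applied in the proofs that follow, where $a_n$ is a probability satisfying $n a_n \to 0$.

```python
import math


def binomial_ratio(n, k):
    """n^(-k) * C(n, k) * k!, which tends to 1 by (3.5)."""
    return math.comb(n, k) * math.factorial(k) / n ** k


def power_ratio(n, k, a_n):
    """(1 - a_n)^(n - k) / exp(-n * a_n), the quantity in (3.6)."""
    return (1.0 - a_n) ** (n - k) / math.exp(-n * a_n)


for n in (10, 1000, 10 ** 6):
    print(n, binomial_ratio(n, 3), power_ratio(n, 3, 1.0 / n ** 2))
```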
3.3. Crackle - The Power Law Distribution. In this section we prove the
results in Section 2.2. First, we need a few lemmas.

Lemma 3.3. If $f = f_p$, and $R_n \to \infty$, then

$\lim_{n \to \infty} (n R_n^{d-\alpha})^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{p,0}$,

where $\mu_{p,0}$ is defined in (2.1).
If, in addition, $n R_n^{-\alpha} \to 0$, then

$\lim_{n \to \infty} (n R_n^{d-\alpha})^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \mu_{p,0}$.

Proof. From Lemma 3.1 we have that

$\mathbb{E}\{S_{0,n}\} = s_{d-1} n \int_{R_n}^\infty r^{d-1} f(r)\,dr$.

Making the change of variables $r \to R_n \rho$ yields

$\mathbb{E}\{S_{0,n}\} = s_{d-1} c_p n R_n^{d-\alpha} \int_1^\infty \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^\alpha}\,d\rho$.

Applying the dominated convergence theorem to the previous integral gives

$\lim_{n \to \infty} (n R_n^{d-\alpha})^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_p \int_1^\infty \rho^{d-1-\alpha}\,d\rho = \frac{s_{d-1} c_p}{\alpha - d} = \mu_{p,0}$.

This proves the first part of the lemma.
Next, from Lemma 3.1 we have that

$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1} n \int_{R_n}^\infty r^{d-1} f(r) (1 - p(r e_1))^{n-1}\,dr$.

The power term is bounded by 1 and therefore will not affect the conditions
needed for dominated convergence. Thus, using (3.6), we only need to
evaluate its limit.

$p(r e_1) = \int_{B_2(r e_1)} f(z)\,dz = \int_{B_2(0)} \frac{c_p}{1 + \|r e_1 + z\|^\alpha}\,dz$,

and after the change of variables $r \to R_n \rho$ we have,

$p(R_n \rho e_1) = c_p R_n^{-\alpha} \int_{B_2(0)} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} z\|^\alpha}\,dz$.

18 ADLER, BOBROWSKI, WEINBERGER 

If $n R_n^{-\alpha} \to 0$, then, by dominated convergence, we have

$\lim_{n \to \infty} n p(R_n \rho e_1) = 0$.

Thus,

$\lim_{n \to \infty} (1 - p(R_n \rho e_1))^{n-1} = \lim_{n \to \infty} e^{-n p(R_n \rho e_1)} = 1$,

and therefore we have

$\lim_{n \to \infty} (n R_n^{d-\alpha})^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \lim_{n \to \infty} (n R_n^{d-\alpha})^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{p,0}$.

This completes the proof of the second part of the lemma. □
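Lemma 3.4. If $f = f_p$, and $R_n \to \infty$ then

$\lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{p,k}$,

where $\mu_{p,k}$ is defined in (2.2). If, in addition, $n R_n^{-\alpha} \to 0$, then

$\lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{p,k}$.

Proof. The proof is in the spirit of the proof of Lemma 3.3 but technically
more complicated. From Lemma 3.2 we have that

$\mathbb{E}\{S_{k,n}\} = \binom{n}{k+2} I_k$,

where

$I_k = s_{d-1} \int_{R_n}^\infty r^{d-1} f(r) G_k(r)\,dr$.

Making the change of variables $r \to R_n \rho$ yields

$I_k = s_{d-1} R_n \int_1^\infty (R_n \rho)^{d-1} f(R_n \rho) G_k(R_n \rho)\,d\rho$

$\quad = s_{d-1} c_p^{k+2} R_n^{d - \alpha(k+2)} \int_1^\infty \int_{(\mathbb{R}^d)^{k+1}} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^\alpha}\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \frac{\mathbb{1}\{\|\rho e_1 + R_n^{-1} y_i\| > 1\}}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} y_i\|^\alpha}\,d\mathbf{y}\,d\rho$.

Thus, using (3.5),

$\lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{S_{k,n}\} = \lim_{n \to \infty} \frac{s_{d-1} c_p^{k+2}}{(k+2)!} \int_1^\infty \int_{(\mathbb{R}^d)^{k+1}} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^\alpha}\, T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \frac{\mathbb{1}\{\|\rho e_1 + R_n^{-1} y_i\| > 1\}}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} y_i\|^\alpha}\,d\mathbf{y}\,d\rho$.

It is easy to show that the integrand is bounded by an integrable term, so
the dominated convergence theorem applies, yielding

$\lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_p^{k+2}}{(k+2)!} \int_1^\infty \rho^{d-1-\alpha(k+2)}\,d\rho \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y}$

$\quad = \frac{s_{d-1} c_p^{k+2}}{(\alpha(k+2) - d)(k+2)!} \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y}$

$\quad = \mu_{p,k}$.

This proves the first part of the lemma.

Next, the terms $G_k(r)$ and $\hat{G}_k(r)$ in Lemma 3.2 differ only by the term
$(1 - p(r e_1, r e_1 + \mathbf{y}))^{n-k-2}$, so dominated convergence still applies. Now,

$p(r e_1, r e_1 + \mathbf{y}) = \int_{U(r e_1, r e_1 + \mathbf{y})} f(z)\,dz = \int_{U(0, \mathbf{y})} f(r e_1 + z)\,dz$,

and substituting $r \to R_n \rho$ yields,

$p(R_n \rho e_1, R_n \rho e_1 + \mathbf{y}) = c_p R_n^{-\alpha} \int_{U(0, \mathbf{y})} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} z\|^\alpha}\,dz$.

If $n R_n^{-\alpha} \to 0$, then using dominated convergence we have

$\lim_{n \to \infty} n p(R_n \rho e_1, R_n \rho e_1 + \mathbf{y}) = 0$.

Thus,

$\lim_{n \to \infty} e^{-n p(R_n \rho e_1, R_n \rho e_1 + \mathbf{y})} = 1$,

and therefore, using (3.6),

$\lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \lim_{n \to \infty} (n^{k+2} R_n^{d - \alpha(k+2)})^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{p,k}$.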
This completes the proof of the second part of the lemma. □

Lemma 3.5. If $f = f_p$, and $R_n \to \infty$ then

$\lim_{n \to \infty} (n^{k+3} R_n^{d - \alpha(k+3)})^{-1} \mathbb{E}\{L_{k,n}\} = \hat{\mu}_{p,k}$,

for some $\hat{\mu}_{p,k} > 0$.

Proof. The proof is very similar to the proof of Lemma 3.4. We need
only replace $T_k$ with an indicator function that tests whether a sub-complex
generated by $k + 3$ points is connected. The exact value of $\hat{\mu}_{p,k}$ will not be
needed anywhere. □

We can now prove Theorem 2.2.

Proof of Theorem 2.2. To prove the limit for $\beta_{0,n}$ simply combine
Lemma 3.3 with the inequality (3.2). To prove the limit for $\beta_{k,n}$, $k \ge 1$,
combine Lemmas 3.4 and 3.5 with the inequality (3.3). □



3.4. Crackle - The Exponential Distribution. In this section we wish to
prove Theorem 2.4. We start with the following lemmas.

Lemma 3.6. If f = f e , o,nd R n — > oo then, 

lim fn<- 1 e-M" 1 E{S'o J n} = /ie,o, 



where /i e ,o is defined in (I2.3p . 
If, in addition, ne~ Rn — > then, 



lim (ntffVM 1 E{S ,n} = /M- 



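Proof. From Lemma 3.1 we have that

$$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$

Using the change of variables $r \to \rho + R_n$ yields

$$\begin{aligned}
\mathbb{E}\{\hat{S}_{0,n}\} &= s_{d-1}\, n \int_0^{\infty} (\rho + R_n)^{d-1} c_e e^{-(\rho + R_n)}\,d\rho \\
&= s_{d-1} c_e\, n R_n^{d-1} e^{-R_n} \int_0^{\infty} \left(\frac{\rho}{R_n} + 1\right)^{d-1} e^{-\rho}\,d\rho.
\end{aligned}$$
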
Applying dominated convergence to the last integral yields,

$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1} c_e \int_0^{\infty} e^{-\rho}\,d\rho = s_{d-1} c_e = \mu_{e,0}.$$

This proves the first part of the lemma. 
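
The rate of this convergence can be checked numerically. After factoring $s_{d-1} c_e\, n R_n^{d-1} e^{-R_n}$ out of $\mathbb{E}\{\hat{S}_{0,n}\}$, what remains is $\int_0^\infty (\rho/R_n + 1)^{d-1} e^{-\rho}\,d\rho$. Expanding with the binomial theorem and using $\int_0^\infty \rho^j e^{-\rho}\,d\rho = j!$ gives the closed form $\sum_{j=0}^{d-1} \binom{d-1}{j}\, j!\, R_n^{-j}$. The sketch below (the function name is illustrative) evaluates this sum and shows it approaching 1.

```python
import math

def exp_ratio(R, d):
    """Closed form of the integral of (rho/R + 1)**(d-1) * exp(-rho) over [0, inf)."""
    return sum(math.comb(d - 1, j) * math.factorial(j) / R ** j for j in range(d))

# The ratio E{S_hat_{0,n}} / (s_{d-1} c_e n R^(d-1) e^(-R)) for d = 3 and growing R.
for R in (1.0, 10.0, 100.0, 1000.0):
    print(R, exp_ratio(R, d=3))
```

The correction terms are of order $1/R_n$, so the normalised expectation is close to $\mu_{e,0}$ once $R_n$ is moderately large.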
Next, from Lemma 3.1 we have that

$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r) \left(1 - p(re_1)\right)^{n-1} dr.$$

The power term will not affect the dominated convergence conditions. Thus, 
we only need to evaluate its limit. 



$$p(re_1) = \int_{B_2(re_1)} f(z)\,dz = \int_{B_2(0)} c_e e^{-\|re_1 + z\|}\,dz,$$

and after the change of variables $r \to \rho + R_n$ we have,

$$p((\rho + R_n)e_1) = \int_{B_2(0)} c_e e^{-\|(\rho + R_n)e_1 + z\|}\,dz \le e^{-(R_n + \rho)} \int_{B_2(0)} c_e e^{\|z\|}\,dz.$$

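If $ne^{-R_n} \to 0$, then

$$\lim_{n\to\infty} n\,p((\rho + R_n)e_1) = 0.$$

Thus,

$$\lim_{n\to\infty} e^{-n\,p((\rho + R_n)e_1)} = 1,$$

and therefore, using (3.6), we have

$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \mu_{e,0}.$$
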

This completes the proof of the second part of the lemma. □ 
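
To illustrate the second part in the simplest setting, the sketch below works in dimension $d = 1$, an assumption made only so that everything is available in closed form. There $f_e(x) = \tfrac{1}{2} e^{-|x|}$, the ball $B_2(r)$ is the interval $[r-2, r+2]$, and for $r \ge 2$ one has $p(r) = e^{-r}\sinh 2$, while the triangle-inequality bound equals $e^{-r}(e^2 - 1)$. The choice $R_n = 2\log n$, for which $ne^{-R_n} = 1/n \to 0$, and all function names are hypothetical.

```python
import math

def p_exact(r):
    """Mass of [r-2, r+2] under f_e(x) = 0.5*exp(-|x|) in d = 1, valid for r >= 2."""
    return math.exp(-r) * math.sinh(2.0)

def p_bound(r):
    """Upper bound exp(-r) * integral of 0.5*exp(|z|) over [-2, 2]."""
    return math.exp(-r) * (math.exp(2.0) - 1.0)

def isolation_factor(n, rho):
    """(1 - p(rho + R_n))**(n-1) with the illustrative choice R_n = 2*log(n)."""
    R_n = 2.0 * math.log(n)
    return (1.0 - p_exact(rho + R_n)) ** (n - 1)

# The factor separating E{S_{0,n}} from E{S_hat_{0,n}} tends to 1.
for n in (10, 10**3, 10**6):
    print(n, isolation_factor(n, rho=0.5))
```

The printed factor increases towards 1, which is the mechanism by which the two normalised expectations share the same limit.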
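Lemma 3.7. If $f = f_e$, and $R_n \to \infty$ then,

$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{e,k},$$

where $\mu_{e,k}$ is defined in (2.4).
If, in addition, $ne^{-R_n} \to 0$ then,

$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{e,k}.$$
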
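Proof. From Lemma 3.2 we have that

$$\mathbb{E}\{\hat{S}_{k,n}\} = \binom{n}{k+2} \hat{I}_k,$$

where

$$\hat{I}_k = s_{d-1} \int_{R_n}^{\infty} r^{d-1} f(r)\, \hat{G}_k(r)\,dr.$$

Making the change of variables $r \to \rho + R_n$ yields

$$\begin{aligned}
\hat{I}_k &= s_{d-1} \int_0^{\infty} (\rho + R_n)^{d-1} f(\rho + R_n)\, \hat{G}_k(\rho + R_n)\,d\rho \\
&= s_{d-1} c_e^{k+2} \int_0^{\infty}\!\! \int_{(\mathbb{R}^d)^{k+1}} (\rho + R_n)^{d-1} e^{-(\rho + R_n)} \prod_{i=1}^{k+1} e^{-\|(\rho + R_n)e_1 + y_i\|} \\
&\qquad\qquad \times T_k(0, y) \prod_{i=1}^{k+1} 1\{\|(\rho + R_n)e_1 + y_i\| > R_n\}\,dy\,d\rho \\
&= s_{d-1} c_e^{k+2} R_n^{d-1} e^{-(k+2)R_n} \int_0^{\infty}\!\! \int_{(\mathbb{R}^d)^{k+1}} \left(\frac{\rho}{R_n} + 1\right)^{d-1} e^{-\rho}\, T_k(0, y) \\
&\qquad\qquad \times \prod_{i=1}^{k+1} e^{-\|(\rho + R_n)e_1 + y_i\|} e^{R_n}\, 1\{\|(\rho + R_n)e_1 + y_i\| > R_n\}\,dy\,d\rho.
\end{aligned}$$
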

The last integral can be easily shown to satisfy the conditions of the domi- 
nated convergence theorem. In addition, it is easy to show that 

$$\lim_{n\to\infty} e^{-\|(\rho + R_n)e_1 + y_i\|} e^{R_n} = e^{-(\rho + \langle e_1, y_i \rangle)} = e^{-(\rho + y_i^1)},$$

where $y_i^1$ is the first coordinate of $y_i \in \mathbb{R}^d$, and also that

$$\lim_{n\to\infty} 1\{\|(\rho + R_n)e_1 + y_i\| > R_n\} = 1\{y_i^1 > -\rho\}.$$

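Altogether, we have that

$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\}
= \frac{s_{d-1} c_e^{k+2}}{(k+2)!} \int_0^{\infty}\!\! \int_{(\mathbb{R}^d)^{k+1}} T_k(0, y)\, e^{-\left((k+2)\rho + \sum_{i=1}^{k+1} y_i^1\right)} \prod_{i=1}^{k+1} 1\{y_i^1 > -\rho\}\,dy\,d\rho,$$
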
proving the first part of the lemma. 
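
The two pointwise limits used in this part of the proof can be checked numerically. The sketch below evaluates $e^{-\|(\rho + R)e_1 + y\|} e^{R}$ and the indicator $1\{\|(\rho + R)e_1 + y\| > R\}$ for a single point $y$ and growing $R$, and compares them with $e^{-(\rho + y^1)}$ and $1\{y^1 > -\rho\}$. The function names and the sample point are illustrative assumptions.

```python
import math

def shifted_norm(R, rho, y):
    """Euclidean norm of (rho + R) e_1 + y."""
    first = rho + R + y[0]
    return math.sqrt(first * first + sum(c * c for c in y[1:]))

def shifted_factor(R, rho, y):
    """exp(-|(rho + R) e_1 + y|) * exp(R), computed as exp(-(norm - R)) to avoid overflow."""
    return math.exp(-(shifted_norm(R, rho, y) - R))

def outside_ball(R, rho, y):
    """Indicator of |(rho + R) e_1 + y| > R."""
    return shifted_norm(R, rho, y) > R

rho, y = 0.8, (-0.3, 1.1, -0.6)      # hypothetical sample point in R^3
limit = math.exp(-(rho + y[0]))      # claimed limit exp(-(rho + y^1))
for R in (1e1, 1e3, 1e6):
    print(R, shifted_factor(R, rho, y), limit, outside_ball(R, rho, y), y[0] > -rho)
```

Only the first coordinate of $y$ survives in the limit because the sphere of radius $R_n$ flattens to a hyperplane orthogonal to $e_1$ as $R_n \to \infty$.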
Next, as in the proof of Lemma 3.4, we need to evaluate the term $p(re_1, re_1 + y)$:

$$p(re_1, re_1 + y) = \int_{U(0,y)} c_e e^{-\|re_1 + z\|}\,dz \le e^{-r} \int_{U(0,y)} c_e e^{\|z\|}\,dz.$$


The change of variables $r \to \rho + R_n$ yields

$$p((\rho + R_n)e_1, (\rho + R_n)e_1 + y) \le e^{-R_n} e^{-\rho} \int_{U(0,y)} c_e e^{\|z\|}\,dz.$$

If $ne^{-R_n} \to 0$, then

$$\lim_{n\to\infty} n\,p((\rho + R_n)e_1, (\rho + R_n)e_1 + y) = 0.$$

Thus,

$$\lim_{n\to\infty} e^{-n\,p((\rho + R_n)e_1,\, (\rho + R_n)e_1 + y)} = 1,$$

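and therefore,

$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\}
= \lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{e,k}.$$
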
This completes the proof. □ 
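Lemma 3.8. If $f = f_e$, and $R_n \to \infty$ then

$$\lim_{n\to\infty} \left(n^{k+3} R_n^{d-1} e^{-(k+3)R_n}\right)^{-1} \mathbb{E}\{L_{k,n}\} = \hat{\mu}_{e,k},$$

where $\hat{\mu}_{e,k} > 0$.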

Proof. As in the proof of Lemma 3.5, mimic the proof of Lemma
3.7, replacing $T_k$ with an indicator function that tests whether a sub-complex
generated by $k + 3$ points is connected. □

Proof of Theorem 2.4. The proof follows the same steps as the proof
of Theorem 2.2. □
3.5. Crackle - The Gaussian Distribution. In this section we prove Theorem 2.6.



Proof of Theorem 2.6. From Lemma 3.1 we have that

$$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$

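Making the change of variables $r \to (\rho^2 + R_n^2)^{1/2}$, which implies $dr = \frac{\rho}{(\rho^2 + R_n^2)^{1/2}}\,d\rho$, we have

$$\begin{aligned}
\mathbb{E}\{\hat{S}_{0,n}\} &= s_{d-1} c_g\, n e^{-R_n^2/2} \int_0^{\infty} (\rho^2 + R_n^2)^{(d-2)/2} \rho\, e^{-\rho^2/2}\,d\rho \\
&= s_{d-1} c_g\, n e^{-R_n^2/2} R_n^{d-2} \int_0^{\infty} \left((\rho/R_n)^2 + 1\right)^{(d-2)/2} \rho\, e^{-\rho^2/2}\,d\rho.
\end{aligned}$$
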



The integrand is bounded, and applying dominated convergence we have

$$\lim_{n\to\infty} \left(n e^{-R_n^2/2} R_n^{d-2}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1} c_g.$$

Taking $R_n = R^{g}_{\epsilon,n} \triangleq \sqrt{2\log n + (d - 2 + \epsilon)\log\log n}$, we have

$$e^{-R_n^2/2} = n^{-1} (\log n)^{-(d-2+\epsilon)/2},$$

and so

$$\lim_{n\to\infty} n e^{-R_n^2/2} R_n^{d-2} = 0,$$

which implies that

$$\mathbb{E}\{\hat{S}_{0,n}\} \to 0.$$

Finally, for every $0 \le k \le d - 1$,

$$\beta_{k,n} \le \hat{S}_{0,n}.$$

Therefore, 

$$\lim_{n\to\infty} \mathbb{E}\{\beta_{k,n}\} = 0,$$

completing the proof. □ 
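
The decay of the normalising term $n e^{-R_n^2/2} R_n^{d-2}$ under the choice $R_n^2 = 2\log n + (d-2+\epsilon)\log\log n$ can be explored numerically. The sketch below takes $\log n$ as its argument so that very large $n$ can be examined without overflow, and uses the identity $n e^{-R_n^2/2} = (\log n)^{-(d-2+\epsilon)/2}$ from the proof. The function name and the sample parameters are illustrative.

```python
import math

def gaussian_rate(log_n, d, eps):
    """n * exp(-R_n**2 / 2) * R_n**(d-2) for R_n**2 = 2*log n + (d-2+eps)*log log n.

    Uses n * exp(-R_n**2 / 2) = (log n)**(-(d-2+eps)/2), so only log n is needed.
    """
    R_sq = 2.0 * log_n + (d - 2 + eps) * math.log(log_n)
    return log_n ** (-(d - 2 + eps) / 2.0) * R_sq ** ((d - 2) / 2.0)

# d = 3, eps = 1: the term decays, but only polylogarithmically in n.
for log_n in (1e1, 1e2, 1e4, 1e8):
    print(log_n, gaussian_rate(log_n, d=3, eps=1.0))
```

The term behaves like a constant multiple of $(\log n)^{-\epsilon/2}$, so the convergence of $\mathbb{E}\{\hat{S}_{0,n}\}$ to zero is slow in $n$ and becomes slower as $\epsilon$ decreases.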